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ABSTRACT 

Motivation: Microarray experiments result in large 
scale data sets that require extensive mining and refin- 
ing to extract useful information. We have been devel- 
oping an efficient novel algorithm for nonmetric multi- 
dimensional scaling (nMDS) analysis for very large data 
sets as a maximally unsupervised data mining device. 
We wish to demonstrate its usefulness in the context 
of bioinformatics. In our motivation is also an aim 
to demonstrate that intrinsically nonlinear methods are 
generally advantageous in data mining. 
Results: The Pearson correlation distance measure is 
used to indicate the dissimilarity of the gene activities in 
transcriptional response of cell cycle-synchronized hu- 
man fibroblasts to serum [Iyer et al, Science 283, 83 
(1999)]. These dissimilarity data have been analyzed 
with our nMDS algorithm to produce an almost circu- 
lar arrangement of the genes. The temporal expression 
patterns of the genes rotate along this circular arrange- 
ment. If an appropriate preparation procedure may be 
applied to the original data set, linear methods such as 
the principal component analysis (PCA) could achieve 
reasonable results, but without data preprocessing lin- 
ear methods such as PCA cannot achieve a useful pic- 
ture. Furthermore, even with an appropriate data pre- 
processing, the outcomes of linear procedures are not 
as clearcut as those by nMDS without preprocessing. 
Availability: The fortran source code of the method 
used in this analysis ('pure nMDS') is available at 
http: / /www. granular.com/MDS / 
Contact: tag(9granular.com ; yoono@uiuc.edu 



genes). Our algorithm is maximally nonmetric in the 
sense that any introduction of intermediate metric co- 
ordinates obtained by monotone regression common to 
the conventional nMDS methods is avoided. 



Data compression is essentially a problem of lin- 
ear functional analysis as Donoho et al. (1998) stresses. 
In contrast, we believe data mining is essentially non- 
linear. There are linear algebraic methods such as the 
principal component analysis (PCA) for data mining, 
but it is expected that nonlinear methods are, in prin- 
ciple, more powerful. The present paper illustrates this 
point. Indeed, in our case PCA cannot find any com- 
prehensible temporal pattern in low dimensional spaces 
without an appropriate data preparation. 



SYSTEMS AND METHODS 
Systems to Analyze 

The gene activities in transcriptional response of cell 
cycle-synchronized human fibroblasts to serum reported 

by Iyer et al. (1999) are analyzed. The microarray data 

used in this analysis is available at http://genome-www.stanford.edu/serv. 



Other Possible Analysis Methods 

To extract interpretable patterns from microarray data, 
cluster and linear multivariate analyses seem to be the 
two major strategies. However, these methods may not 
be ideally suited for the purpose. 



INTRODUCTION 

Each DNA microarray experiment can give us infor- 
mation about the relative populations of mRNAs for 
thousands of genes. This implies that without exten- 
sive data mining it is often hard to recognize any useful 
information from the experimental results. In this pa- 
per we demonstrate that a nonmetric multidimensional 
scaling (nMDS) method can be a powerful unsupervised 
means to extract temporal expression patterns of genes. 
A data mining procedure may be useful, if it is flexible 
enough to incorporate any level of supervision, but we 
believe that the most basic feature required for any good 
data mining method is to be able to extract recogniz- 
able patterns reproducibly without supervision. In this 
sense our nMDS method is clearly demonstrated to be 
a useful means of data mining. 

We have been developing an efficient nMDS tech- 
nique for large data sets (Taguchi and Oono, 1999, 
Taguchi et al, 2001). The input is the rank order of 
(dis)similarities among the objects (in the present case, 



The cluster analysis seems to be the most pop- 
ular analytical method (Slonim, 2002). For example, 
the hierarchical clustering method seems to be popular 
(Eisen et al, 1998). Perhaps there are two fundamen- 
tal criticisms against clustering methods. Classifying 
the expression patterns as functions of time is often at- 
tempted by clustering methods (Spellman et al 1998). 
However, it is often the case that temporal gene expres- 
sion patterns vary rather continuously without natural 
gaps among various patterns; needless to say, cluster 
analysis is not a suitable method to classify continu- 
ously changing objects. The second criticism is that 
clustering methods cannot give any relation among re- 
sultant clusters other than 'genealogical relations' mim- 
icking similarities. Therefore, clustering is unsuitable 
for temporal pattern analysis. For example, if the re- 
sulting clustering is ((A,B),C), it is only by inspection 
that ABC or BAC is chosen as a natural temporal pat- 
tern or structure. Thus, to exhibit the temporal pattern 
rearranging the genes in each cluster by hand is needed. 
An example with such a procedure may be found in 
Spellman et al. (1998). 
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Perhaps the most popular linear multivariate anal- 
ysis method is the principal component analysis (PCA). 
The main idea is to choose a data- adapted basis set, and 
to make a subspace that can capture salient features of 
the original data set. In principle, the method could 
capture the temporal order in the gene expression pat- 
tern, but the dimension of the subspace may not be low 
even if the data are on a very low dimensional mani- 
fold. In short, the information compression capability 
of linear methods is generally feeble. This can be well 
illustrated by the data we wish to analyze in this paper: 
PCA cannot capture any clear temporal order as shown 
in Fig. 1, where the two dimensional space spanned by 
the first two principal components is shown there. The 
temporal expression pattern is hardly seen from the re- 
sult. 

Fig. 1 

However, apparently, Holter et al. (2000) demon- 
strated that the singular value decomposition (SVD; a 
linear method) is remarkably successful in extracting 
the characteristic modes. The reader must wonder why 
there is a difference between this result and the one due 
to PCA that is not successful. The secret is in the highly 
nonlinear 'polishing' of the original data (proposed by 
Eiscn et al. (1998)). However, the role of this nonlinear 
polishing must be considered carefully, because it can 
generate a spurious temporal behavior. Therefore, we 
relegate the comparison of these linear methods with 
data preprocessing and our nMDS to Appendix II. The 
salient conclusions are: 

(1) Linear methods such as PCA and SVD could per- 
haps achieve reasonable results, if a data preparation 
scheme is chosen correctly. However, best linear results 
are generally fairly inferior to nonlinear results. 

(2) The data preparation such as the 'polishing' used 
by Holter et al. (2000) could actually corrupt the origi- 
nal data (as illustrated in Appendix II), and should be 
avoided. 

An interesting proposal is to use the partial least 
squares (PLS) regression (Johansson et al, 2003). In 
this case one may assume a temporal order one wishes 
to extract (say, a sinusoidal change in time), and the 
original data are organized around the expected pat- 
tern. This is, so to speak, to analyze the data accord- 
ing to a certain prejudice. Although in the process of 
organization no supervision is needed, the pattern to 
be extracted (that is, the 'prejudice') must be presup- 
posed. Thus, even if it is unsupervised, it is hardly 
a foolproof method. Furthermore, if a clear objective 
pattern could be extractable by this method, certainly 
nMDS can achieve the same goal without any presup- 
posed pattern required by PLS. 



Metric multidimensional scaling methods (MDS) 
may also be used, but it depends on the definition of the 
dissimilarity. Therefore, unless the measure of dissimi- 
larity is (almost) dictated by the data or by the context 
of the data analysis, arbitrary elements arc introduced. 
For example, in the case of the microarray data there 
is no natural dissimilarity measure, so the metric that 
may capture detailed information could carry spurious 
information (disinformation, so to speak) as well. Also, 
if the dissimilarity data is with signs as in the case of 
correlation coefficients, an extra arbitrary factor inter- 
venes when they are converted to positive dissimilarity 
measures. A further disadvantage of metric MDS (and 
of linear methods) is that these methods are vulnerable 
to missing or grossly inaccurate data. 

The cluster analysis with the aid of self-organizing 
maps (SOM) is definitely a nonlinear data analysis method, 
but as we have seen in Kasturi et al. (2003) it is not par- 
ticularly suitable for extracting temporal order. Kasturi 
et al. comment that SOM is not particularly better than 
the ordinary cluster analysis. Furthermore, as can be 
seen from the fact that the use of a particular initial 
condition can be a methodological paper (Kanaya et al., 
2001), we must worry about the ad hoc initial-condition 
dependence of the results. 



ALGORITHM 

Basic idea of algorithm for nMDS 

The philosophy of nMDS (Shepard 1962a, 1962b, Kruskal 
1964a, 1964b) is to find a constellation in a certain space 
1Z of points representing the objects under study (genes 
in the present case) such that the pairwise distances d 
of the points in 1Z have the rank order in closest agree- 
ment with the rank order of the pairwise dissimilarities 
S of the corresponding objects that are given as the raw 
(or the original) input data. 

The conventional nMDS methods assume a certain 
intermediate pair distance d that is chosen as close as 
possible to d for a given object pair under the condition 
that it is monotone with respect to the given actual 
ordering of the dissimilarities S. The choice of d is not 
unique. The discrepancy between d and d is called the 
stress, and all the algorithms attempt to minimize it. 
Depending on the choice of d and on the interpretation 
of "as close as," different methods have been proposed 
(see, for example, Green et al. (1970), Cox and Cox 
(1994), and Borg and Grocncn (1997) ). The choice of 

d affects the outcome, d is required only by technical 
reasons for implementation of the basic nonmetric idea, 
so to be faithful to the original idea due to Shepard 
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(1962a, 1962b) we must compare S with d directly. Our 
motivation is to make an algorithm that is maximally 
nonmetric in the sense that we get rid of d. 



the bad points. Thus, we may estimate 



The basic idea of this 'purely nonmetric' algorithm 
is as follows (Taguchi and Oono, 1999, Taguchi et al., 
2001): in a metric space TZ (in this paper, _D-dimcnsional 
Euclidean space R D is used) N points representing the 
N objects are placed as an initial configuration. For 
this initial trial configuration we compute the pair dis- 
tances d(i,j), and then rank them according to their 
magnitudes. Comparing this ranking and that accord- 
ing to the dissimilarity S(i,j), we compute the 'force' 
that moves the points in TZ to reduce the discrepancies 
between these two rankings. After moving the points 
according to the 'forces', the new 'forces' are computed 
again, and the whole adjusting process of the object po- 
sitions in TZ is iterated until they converge sufficiently. 
The details are in Appendix I. 

nMDS can usually recover geometrical objects cor- 
rectly (up to scaling, orientation, and direction) when 
there are sufficiently many (say, > 30) objects. There- 
fore, nMDS is a versatile multivariate analysis method. 

It is desirable to have a criterion for convergence 
(analogous to the level of the stress in the conventional 
nMDS), or a measure of goodness of embedding. To 
this end let us recall the Kendall statistics K (p364, 
Hollander and Wolfe (1999)), 



K 



sign[(d Q 



where the summation is over all the pairs of dissimilar- 
ities (distances between objects) (a, (3) (i.e., a (also (3) 
denotes a pair of objects). Usually, this is used for a sta- 
tistical test to reject the null hypothesis that {d(i,j)} 
does not correlate with {5(i,j)} (The contribution of 
ties is negligible usually for large data set, so we do not 
pay any particular attention to tie data). 

Here, we use this value to estimate the number of 
the objects embedded correctly. If all the objects are 
correctly embedded, all the summands are 1. Thus, if 
N' objects are correctly embedded, and if we may as- 
sume that the rest are uncorrelated, then K is expected 
to be 

K >n'(n' -l)/2-0[N% 

where n' — N'(N' — l)/2, and the subtraction comes 
from the random sum of at most N c 2 ^2 of ±1. If the 
embedding is successful for the majority of the objects, 
then n! = 0[N 2 ], so we may ignore the contribution of 



Therefore, we adopt 100V2V2K/N% as an indicator 
of goodness of embedding. 

We must also discuss the initial configuration de- 
pendence of the result. Our algorithm is not free from 
the problem of local minima as all of the previously 
proposed algorithms for nMDS and as high dimensional 
nonlinear optimization problems in general. However, 
generally speaking, this dependence has only a very mi- 
nor effect. This will be checked for the fibroblast data 
(See below). 

RESULTS 

We have found that the fibroblast data may be embed- 
ded in a two dimensional space roughly as a ring (Fig. 
2). The estimated number of correctly embedded genes 
is about 480 among all the 517 genes (i.e., the goodness 
of embedding is more than 90%). (Also 516 out of 517 
genes have P < 0.005 confidence level (see Appendix I) 
). Thus, we conclude that the obtained configuration is 
sufficiently reliable. 

This is in remarkable contrast with the PCA re- 
sult mentioned already (Fig. 1). Further remarkable 
is the fact that this ring-like arrangement of the genes 
faithfully represents the temporal expression patterns 
of the genes as can be seen clearly from the rotation 
of the expression peaks around the ring (Fig. 3a). It 
is noteworthy that the angle coordinate assigned to the 
genes according to the result shown in Fig. 2 automati- 
cally gives the figure usually obtained through detailed 
Fourier analysis (Fig. 3b). These figures should be elo- 
quent enough to attest to the usefulness of nMDS, a 
nonlinear data mining method. 

Finally, to see the initial configuration dependence 
for the case of the fibroblast data we constructed two 2D 
embedding results starting from two different random 
initial configurations. With the aid of the Procrustean 
similarity transformation (Borg and Groenen, 1997) one 
result is fit to the other (notice that our procedure is 
nonmetric, so to compare two independent results, ap- 
propriate scales, orientations, etc., must be optimally 
chosen). Fig. 4 demonstrates the close agreements of 
x- and y-coordinates of the two results. As illustrated, 
the dependence on the initial conditions is very weak, 
and we may regard the embedded structure as a faithful 
representation of the information in the original data. 

Fig. 4 
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As has been clearly demonstrated, the 2D embed- 
ding is statistically natural and informative. Still the 
2D embedding is not perfect, so it is interesting to 
see what we might obtain by 'unfolding' the 2D data, 
adding one more axis. The unfolded result is shown 
in Fig. 5. Here, the angular coordinates <j> and 8 of 
the spherical coordinate system is determined by the 
icy-plane whose x-(resp., y-)axis is the first (resp., the 
second) principal component of the 3D embedded re- 
sult. The total contribution of these two components 
is 86%. We do not recognize any clear pattern other 
than that captured in the 2D space. Therefore, we may 
conclude that the 2D embedding result is sufficiently 



Fig. 5 reliable and informative. 



tual implementation of the algorithm, simpler forces are 
adopted than the one obtained from this potential as 
seen below). This A may be regarded as a counterpart 
of the stress in the conventional nMDS. As we will see 
later we can use quantities related to A to evaluate the 
confidence level of the resultant configuration. Thus, 
an important feature of our nMDS algorithm is that 
the optimization process is directly connected to a pro- 
cess that improves the confidence level of the resultant 
configuration. 

The 'pure nMDS' algorithm for TV objects may be 
described as follows: 



CONCLUSION 

We have demonstrated that the nMDS can be a useful 
tool for data mining. It is unsupervised, and perhaps 
maximally nonlinear. Our algorithm is probably the 
simplest among the nonmetric MDS algorithms and is 
efficient enough to enable the analysis of a few thousand 
objects with a desktop PC. 

The NMDS algorithm works on the binary rela- 
tions among the objects, so if there are N objects, com- 
putational complexity is of order iV 2 at least. There- 
fore, it is far slower than linear methods such as PC A, 
although our nonlinear algorithm is practically fast enou; 
because we have used for this work a small notebook PC 
(Mobile Celeron 650MHz cpu with 256MB RAM). As 
pointed out and as has been illustrated, with an appro- 
priate data preprocessing a certain linear method could 
give us a reasonable result with less computational ef- 
forts. Although in this paper we have not made any 
particular effort to reduce computational requirements, 
a practical way to use nMDS may be to prepare an 
initial configuration by a linear method with an appro- 
priate data preprocessing method that is verified to be 
consistent with the full nMDS results. 

APPENDIX I 

'Purely' non-metric MDS algorithm 

Suppose d(i,j) is the distance between objects i and j 
in 1Z. Let the ranking of 5(i 7 j) among all the input 
dissimilarity data be n and that of d(i 7 j) among all the 
distances between embedded pairs be T n . If n > T n 
(resp., n < T n ), we wish to 'push' the pair i and j far- 
ther apart (resp., closer) in 1Z. Intuitively speaking, to 
this end we introduce an 'overdamped dynamics' of the 
points in 1Z driven by the following potential function 

A^]T(T n -n) 2 . 

Here, the summation is over all the pairs (in the ac- 



1. Dissimilarities Sij — 1, • • ■ , N) for N objects 
are given. Order them as follows: 

■ ■ ■ < < Ski < ■ ■ ■ ■ 

2. Put N points randomly in TZ as an initial config- 
uration. 

3. Scale the position vectors in 1Z such as ySi l r il 2 = 
1, where is the current position of object i in 
1Z. 

4. Compute d^ for all object pairs in 1Z, and 
then order them as 

• • • < dij < du < ■ ■ • ■ 

5. Suppose Sij is the mth largest in the ordering in 
n and d^ is the T m th largest in the ordering in 
0] Assign dj = T m — m. Calculate the following 
displacement vector for i: 

x - n Vl ~ r i 

OTi — s y *^ij ] T i 

j \i-i-rj\ 

where s — 0.1 x A~ 3 typically, and update — > 
r l + 5r % . 

6. Return to 3, and continue until the "potential en- 
ergy" becomes sufficiently small. 

The reader may worry about the handling of tie 
data. Generally speaking, for a large data set the frac- 
tion of tie relations is not significant; furthermore, if the 
result depends on the handling schemes of tie data, the 
result is unreliable anyway. Therefore, we do not pay 
any particular attention to the tie data problem. 

In the above algorithm, s is a constant value. In 
practice, we could choose an appropriate schedule to 
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vary s as is often done in optimization processes. In 
this paper, for simplicity, we do not attempt such a fine 
tuning. 

In the above algorithm, we can deal with asymmet- 
ric data as well, i.e., % ^ Sji if we compare Sij with dij 
while Sji with dji(= dij). Needless to say, if the mis- 
match between Sij and Sji is large, then representing 
the pair by a pair of points in a metric space is ques- 
tionable. Therefore, we will not discuss this problem 
any further in this paper. 

Goodness of embedding 

In the text we have already discussed the effective num- 
ber of correctly embedded objects as a measure of 'global 
goodness of embedding.' This measure, however, can- 
not tell us the embedding quality of each object. It 
is often the case that the majority of objects are em- 
bedded well even without sensitive dependence on the 
initial conditions, but there are a few objects that con- 
sistently refuse to be embedded stably. To judge the 
quality of embedding for each object j we define 

A(j) = J2[T n{j) (j)-n(j)] 2 , 

Here, n(j) is the rank order of S(i,j) among N — 1 
pairs for a given j, and T n ^(j) is the rank order 
of d(i,j) among N — 1 pairs (i, j) for the same j. 

A(j) can be regarded as a statistical variable for 
the relative position of the j-th object with respect to 
the remaining objects (Lehmann 1975). We can esti- 
mate the probability P(e) of A(j) < e with the null hy- 
pothesis that the rank ordering of dij (i e {1, 2, • • • , 7V}\ 
{j}) is totally random with respect to the rank order- 
ing of Sij (i e {1, 2, • • • , N} \ {j}). If N is sufficiently 
large, then A(j) obeys the normal distribution with 
mean (M 3 - M) /6 and variance M 2 (M + 1 ) 2 (M - 1 ) /36 , 
where M = N — 1. For smaller N there is a table for 
P(e) (Lehmann 1975). Thus, we can test the null hy- 
pothesis with a given confidence level for j-th object. 

APPENDIX II 

Limitations and capabilities of linear methods 

The limitations and capabilities of PCA with and with- 
out data preprocessing are illustrated in this appendix. 
There is no fundamental difference between PCA and 
SVD. We consider the following artificial data {s gt }, 
where g (= 1, • • • , 517) denote genes and t (= 1, • • • , 11) 
the observation times: 
[Data set 1] 

Sgt = C g cos(27rf/ll + 2irS g ). 



[Data set 2] 

s 2 gt = cxp(s^). 

[Data set 3] 

s 3 gt = C g i cos(27rf/ll + 2TrS lg ) 

+C g2 exp[cos(27rf/ll + 2tt(5 29 )] 
+ C g3 / cos(27rf/ll + 27n5 3s ). 

In the above, C g ,S g ,Cig,Sig, (i — 1,2,3) arc uniform 
random numbers in [0, 1]. That is, Data set 1 is a set of 
sinusoidal waves with random amplitudes and phases, 
Data set 2 is the nonlinearly distorted Data set 1, and 
Data set 3 is a set of periodic functions that are very 
different from simple oscillatory behaviors. 

These data sets are analyzed by the following meth- 
ods. 

Method 1: PCA with the preprocessing used by Holter 
et al. (2000). The preprocessing procedure is as follows: 
step 1: Subtract the average, 

s 'gt — s gt — ( s gt)g,t, 

where (•) fl ,t is the average over all genes and experi- 
ments, 

y *• 

\')g,t = 7- 

step 2: (Column normalization) Normalize the data as 

„ _ S gt 

s gt - r 

step 3: (Row normalization) Normalize the data as 

s" 

Repeat these steps until the following condition is sat- 
isfied, 

yj([s9t-s%]\ t < o.oi. 

From the resultant s gt correlation matrix Matr.(Cortt') 
is constructed, and then PCA is performed. 

Method 2: PCA with the preprocessing so that J2 t s gt = 
and J2t s gt = 1 f° r au 9- C* 1 course, no iteration is 
needed for this preprocessing. From the resultant s gt 



G 



correlation matrix Matr.{Cor U ') is constructed, and 
then PCA is performed. 

Method 3: nMDS as done in the text. That is, the 
negative of the correlation coefficient Cor gg > is used 
as the dissimilarity and nMDS is applied straightfor- 
wardly. Needless to say, no preprocessing of data is 
Fig. 6 needed. 

The results are exhibited in Figure 6. The conclu- 
sions may be: 

(1) For Data set 1, any method will do. 

(2) For Data set 2, the procedure recommended by 
Holter et al. (2000) fails, although ironically simpler 
Method 2 still works very well. If the amplitude C 
is distributed in [0, 5] instead of [0, 1] (that is, the ex- 
tent of the nonlinear distortion is increased), Method 
2 becomes inferior to Method 3, but still Method 2 is 
adequate. 

(3) For Data set 3, even Method 2 fails. nMDS (Method 
3) still exhibits a ring-like structure. The method rec- 
ommended by Holter et al. (2000) is obviously out of 
question. 
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Thus, we may conclude that nMDS is a versatile 
and all around data mining method for analyzing peri- 
odic temporal data. Furthermore, we can point out that 
the preprocessing method in Method 1 should not be 
used because it could severely distort the original data 
(as may have been expected from the figures). Suppose 
there are N genes and 4 time points. Consider the fol- 
lowing example (for the counterexample sake). The first 
gene has (a, b, —b, —a) (a > b > 0), and the remaining 
genes are all give by (1,0,0,-1). The N x 4 matrix 
made from these vectors is polished by an iterative row 
and column vector normalization procedure. If N is 
sufficiently large, the first row converges to (0, 1, —1, 0) 
and the rest to (1, 0, 0,-1), independent of a and b. If b 
is small, then all the vectors should behave almost the 
same way, but after polishing the out-of-phase compo- 
nent in the discrepancy between the first row and the 
rest is dramatically enhanced, resulting in a spurious 
out of phase temporal behavior. Although the preced- 
ing exercise is trivial, the result warns us the danger of 
using the so-called polishing. 
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Figure legends 



Figure 1: PCA results using correlation coefficient matrix. The first two principal components are used as the 
horizontal and vertical axis, respectively (the cumulative proportion is 70 %). Genes whose experimental values 
are larger than 3.2 are drawn using filled boxes, otherwise drawn using small dots (the corresponding color figure 
is available online). From the top the time is, respectively, 15 min, 30 min, 1 hr, 2 hr, 4 hr, 6 hr, 8 hr, 12 hr, 16 
hr, 20 hr, 24 hr. 

Figure 2. Two dimensional embedding result obtained by nMDS. 

Figure 3: (a) Temporal patterns of gene expression levels visualized with the aid of nMDS. Colors indicate relative 
intensity of experimental values normalized so that ^ t s gt = and J2t s lt ~ 1 where s g t is experimental variable 
of gth genes at time t. (red > 1.6, yellow > 1.2, green > 0.8, pale blue > 0.4, gray < 0.4). Time sequences are 
the same as explained at Fig. 1 . (b) Gene expression data as a function of the angle measured from the vertical 
axis in (a). The horizontal axis corresponds to t. The color convention is the same as in (a). 

Figure 4 Comparison between the nMDS embedding results with two different initial configurations after Pro- 
crustean similarity transformation. The horizontal (resp., vertical) coordinates are compared in the left (resp., 
right) figure. In each figure x-axis corresponds to the result from one initial condition and the j/-axis the other. 

Figure 5: 3D unfolding of the temporal pattern of gene expression level with the aid of nMDS (3D). Experimental 
values are normalized as explained in Fig. 3. Genes whose experimental values are larger than 1.6 are drawn 
using filled boxes, otherwise drawn using small dots (the corresponding color figure is available online). The 
horizontal (resp., vertical) axis represents <p (resp., 9). See the text for detail. 

Figure 6 Comparison of linear and nonlinear methods. 

Method 1: PCA with polishing (Holter et al. 2000); Method 2: PCA with normalization; Method 3: 2D space 
embedding with the aid of nMDS. See the text for Data sets and Methods. For Methods 1 and 2, horizontal and 
vertical axes are the first and second principal components, respectively, and the percentages describe cumulative 
proportions. For Method 3, the percentages are the indicators of goodness defined in the text. 



9 



Figure legends for online only color figures: 



Figure 1 (Color; online only): PC A results using correlation coefficient matrix. The first two principal components 
are used as the horizontal and vertical axis, respectively (the cumulative proportion is 70 %). Colors indicate 
relative intensity of experimental values (red > 3.2, yellow > 2.4, green > 1.6, pale blue > 0.8, gray < 0.8). From 
the top the time is, respectively, 15 min., 30 min., 1 hr, 2 hr, 4 hr, 6 hr, 8 hr, 12 hr, 16 hr, 20 hr, 24 hr. 

Figure 5 (Color: online only): 3D unfolding of the temporal pattern of gene expression level with the aid of nMDS 
(3D). The color convention is the same as explained in Figure 3. The horizontal (resp., vertical) axis represents 
<f> (resp., 9). See the text for detail. 
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